A very quick introduction to looping in R
Jim Rose
Oct 26 2022
A loop is a set of instructions that lets you apply one or more custom code operations repeatedly for a predetermined number of cycles (aka iterations)
https://contest-server.cs.uchicago.edu
Iterator or counter
Code to be run repeatedly
Exit condition
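As a rough sketch of how these three parts fit together (the variable names are illustrative and not from the original slides), a while loop spells out each part explicitly, while a for loop manages the counter and the exit condition for you:

```r
# Hypothetical example: sum the integers 1 through 5 in two ways.

# while loop: all three components are written out explicitly
i <- 1                  # iterator (counter), initialised before the loop
total_while <- 0
while (i <= 5) {        # exit condition: the loop stops once i exceeds 5
  total_while <- total_while + i   # code to be run repeatedly
  i <- i + 1            # advance the counter
}

# for loop: R advances the counter and exits after the last element
total_for <- 0
for (k in 1:5) {
  total_for <- total_for + k
}

print(total_while)
print(total_for)
```

Both versions compute the same sum; the for loop is usually the safer choice when the number of cycles is known in advance, because there is no counter update to forget.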
Let’s say you wanted to find the mean and standard deviation of each row in this matrix
First create an empty output object
An object to hold the output must be created OUTSIDE of the loop first, or else the loop will cause an error
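A minimal sketch of this pattern follows. The matrix itself is not reproduced in this text, so the example assumes a hypothetical 3 x 5 matrix named mymatrix and a hypothetical output object named rowstats:

```r
# Hypothetical 3 x 5 matrix standing in for the one on the slide
set.seed(1)
mymatrix <- matrix(rnorm(15, mean = 10, sd = 3), nrow = 3, ncol = 5)

# Create the empty output object BEFORE the loop: one row per matrix row,
# one column per statistic
rowstats <- matrix(NA_real_, nrow = nrow(mymatrix), ncol = 2,
                   dimnames = list(NULL, c("mean", "sd")))

# Fill in one row of the output on each pass through the loop
for (i in seq_len(nrow(mymatrix))) {
  rowstats[i, "mean"] <- mean(mymatrix[i, ])
  rowstats[i, "sd"]   <- sd(mymatrix[i, ])
}

rowstats
```

If rowstats were not created before the loop, the first assignment inside the loop would fail because the object would not exist yet.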
The counter variable can be named anything you like
R will iterate through whatever you declare after the in operator, assigning each element in turn to the counter variable
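The code behind the output that follows is not shown in this text. A loop along these lines would print it; the vector name dragons and the counter name dragon are assumptions chosen for illustration:

```r
# Hypothetical character vector to iterate over
dragons <- c("Vhagar", "Caraxes", "Syrax", "Meleys")

# The counter is named "dragon" here, but any valid name would work.
# On each pass it holds the next element of the vector.
for (dragon in dragons) {
  print(paste(dragon, "the fearsome"))
}
```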
[1] "Vhagar the fearsome"
[1] "Caraxes the fearsome"
[1] "Syrax the fearsome"
[1] "Meleys the fearsome"
Now let’s use these same calculations to create a z-scored version of the original matrix
# First create empty output object
zscored <- matrix(nrow = nrow(mymatrix), ncol = ncol(mymatrix))
for (i in seq_len(nrow(mymatrix))) {
  # Then loop through rows using i to calculate the row statistics
  # (named row_mean and row_sd so they do not mask mean() and sd())
  row_mean <- mean(mymatrix[i, ])
  row_sd <- sd(mymatrix[i, ])
  # Then use these values to normalize each entry in the matrix
  for (j in seq_len(ncol(mymatrix))) {
    zscored[i, j] <- (mymatrix[i, j] - row_mean) / row_sd
  }
}
zscored
This is called a nested loop. You need two different counter variables: i and j
           [,1]       [,2]       [,3]       [,4]       [,5]
[1,] -0.3864956 -1.1522649  1.5124925  0.3558644 -0.3295964
[2,] -0.5274012  0.3137401  0.7238038 -1.4735541  0.9634114
[3,]  1.4812730 -1.0907623 -0.3074597  0.4674380 -0.5504890
Loops can be combined with custom functions to improve simplicity and readability of your code
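The function-based code is not reproduced in this text, so the following is only a sketch of one way to do it. The helper name zscore and the 3 x 5 matrix are hypothetical, so the printed values will differ from the output shown on these slides:

```r
# Hypothetical helper: z-score a numeric vector
zscore <- function(x) {
  (x - mean(x)) / sd(x)
}

# Hypothetical 3 x 5 matrix standing in for the one on the slides
set.seed(1)
mymatrix <- matrix(rnorm(15, mean = 10, sd = 3), nrow = 3, ncol = 5)

# Output object is still created before the loop
zscored <- matrix(NA_real_, nrow = nrow(mymatrix), ncol = ncol(mymatrix))

# The function handles a whole row at once, so no inner loop is needed
for (i in seq_len(nrow(mymatrix))) {
  zscored[i, ] <- zscore(mymatrix[i, ])
}

zscored
```

Moving the arithmetic into a named function removes the inner loop and makes the body of the outer loop a single readable line.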
           [,1]       [,2]       [,3]       [,4]       [,5]
[1,] -0.3864956 -1.1522649  1.5124925  0.3558644 -0.3295964
[2,] -0.5274012  0.3137401  0.7238038 -1.4735541  0.9634114
[3,]  1.4812730 -1.0907623 -0.3074597  0.4674380 -0.5504890
Vectorized functions are nearly always faster than loops
library(tictoc)
# Draw one value per cell (300 * 500); rnorm(1500, ...) would be recycled
myBIGmatrix <- matrix(rnorm(300 * 500, mean = 10, sd = 3),
                      nrow = 300, ncol = 500)
BIGzscored <- matrix(nrow = nrow(myBIGmatrix), ncol = ncol(myBIGmatrix))
tic()
for (i in seq_len(nrow(myBIGmatrix))) {
  # First calculate the row statistics
  row_mean <- mean(myBIGmatrix[i, ])
  row_sd <- sd(myBIGmatrix[i, ])
  # Then use these values to normalize each entry in the matrix
  for (j in seq_len(ncol(myBIGmatrix))) {
    BIGzscored[i, j] <- (myBIGmatrix[i, j] - row_mean) / row_sd
  }
}
toc()
0.035 sec elapsed
Vectorized functions save compute time with large datasets, but the difference is minimal if you are not dealing with a lot of data points.
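To see the difference for yourself, the sketch below times a nested loop against a vectorized calculation on the same data. It is an illustration rather than the original comparison: the object names are hypothetical, and it uses base R's system.time() instead of tictoc so that it runs without extra packages. Timings depend on the machine, so none are quoted here.

```r
# Hypothetical 300 x 500 matrix of random values
set.seed(42)
big <- matrix(rnorm(300 * 500, mean = 10, sd = 3), nrow = 300, ncol = 500)

# Nested-loop version
loop_time <- system.time({
  z_loop <- matrix(NA_real_, nrow = nrow(big), ncol = ncol(big))
  for (i in seq_len(nrow(big))) {
    row_mean <- mean(big[i, ])
    row_sd   <- sd(big[i, ])
    for (j in seq_len(ncol(big))) {
      z_loop[i, j] <- (big[i, j] - row_mean) / row_sd
    }
  }
})

# Vectorized version: a vector with one value per row is recycled down
# each column, so entry [i, j] is paired with the statistic for row i
vec_time <- system.time({
  z_vec <- (big - rowMeans(big)) / apply(big, 1, sd)
})

# Check that both approaches agree, then compare elapsed times
print(all.equal(z_loop, z_vec))
print(loop_time["elapsed"])
print(vec_time["elapsed"])
```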
See here for more detailed info on writing loops in R